Will a Customer Accept the Coupon?¶

Context

Imagine driving through town and a coupon is delivered to your cell phone for a restaraunt near where you are driving. Would you accept that coupon and take a short detour to the restaraunt? Would you accept the coupon but use it on a sunbsequent trip? Would you ignore the coupon entirely? What if the coupon was for a bar instead of a restaraunt? What about a coffee house? Would you accept a bar coupon with a minor passenger in the car? What about if it was just you and your partner in the car? Would weather impact the rate of acceptance? What about the time of day?

Obviously, proximity to the business is a factor on whether the coupon is delivered to the driver or not, but what are the factors that determine whether a driver accepts the coupon once it is delivered to them? How would you determine whether a driver is likely to accept a coupon?

Overview

The goal of this project is to use what you know about visualizations and probability distributions to distinguish between customers who accepted a driving coupon versus those that did not.

Data

This data comes to us from the UCI Machine Learning repository and was collected via a survey on Amazon Mechanical Turk. The survey describes different driving scenarios including the destination, current time, weather, passenger, etc., and then ask the person whether he will accept the coupon if he is the driver. Answers that the user will drive there ‘right away’ or ‘later before the coupon expires’ are labeled as ‘Y = 1’ and answers ‘no, I do not want the coupon’ are labeled as ‘Y = 0’. There are five different types of coupons -- less expensive restaurants (under $20), coffee houses, carry out & take away, bar, and more expensive restaurants (\$20 - $50).

Deliverables

Your final product should be a brief report that highlights the differences between customers who did and did not accept the coupons. To explore the data you will utilize your knowledge of plotting, statistical summaries, and visualization using Python. You will publish your findings in a public facing github repository as your first portfolio piece.

Data Description¶

Keep in mind that these values mentioned below are average values.

The attributes of this data set include:

  1. User attributes
    • Gender: male, female
    • Age: below 21, 21 to 25, 26 to 30, etc.
    • Marital Status: single, married partner, unmarried partner, or widowed
    • Number of children: 0, 1, or more than 1
    • Education: high school, bachelors degree, associates degree, or graduate degree
    • Occupation: architecture & engineering, business & financial, etc.
    • Annual income: less than $12500, \$12500 - $24999, \$25000 - $37499, etc.
    • Number of times that he/she goes to a bar: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    • Number of times that he/she buys takeaway food: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    • Number of times that he/she goes to a coffee house: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    • Number of times that he/she eats at a restaurant with average expense less than $20 per person: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    • Number of times that he/she goes to a bar: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
  1. Contextual attributes
    • Driving destination: home, work, or no urgent destination
    • Location of user, coupon and destination: we provide a map to show the geographical location of the user, destination, and the venue, and we mark the distance between each two places with time of driving. The user can see whether the venue is in the same direction as the destination.
    • Weather: sunny, rainy, or snowy
    • Temperature: 30F, 55F, or 80F
    • Time: 10AM, 2PM, or 6PM
    • Passenger: alone, partner, kid(s), or friend(s)
  1. Coupon attributes
    • time before it expires: 2 hours or one day
In [1]:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objects as go

Load Data¶

1.Read in the coupons.csv file into a data frame.

In [2]:
df = pd.read_csv('data/coupons.csv')
In [3]:
df.head()
Out[3]:
destination passanger weather temperature time coupon expiration gender age maritalStatus ... CoffeeHouse CarryAway RestaurantLessThan20 Restaurant20To50 toCoupon_GEQ5min toCoupon_GEQ15min toCoupon_GEQ25min direction_same direction_opp Y
0 No Urgent Place Alone Sunny 55 2PM Restaurant(<20) 1d Female 21 Unmarried partner ... never NaN 4~8 1~3 1 0 0 0 1 1
1 No Urgent Place Friend(s) Sunny 80 10AM Coffee House 2h Female 21 Unmarried partner ... never NaN 4~8 1~3 1 0 0 0 1 0
2 No Urgent Place Friend(s) Sunny 80 10AM Carry out & Take away 2h Female 21 Unmarried partner ... never NaN 4~8 1~3 1 1 0 0 1 1
3 No Urgent Place Friend(s) Sunny 80 2PM Coffee House 2h Female 21 Unmarried partner ... never NaN 4~8 1~3 1 1 0 0 1 0
4 No Urgent Place Friend(s) Sunny 80 2PM Coffee House 1d Female 21 Unmarried partner ... never NaN 4~8 1~3 1 1 0 0 1 0

5 rows × 26 columns

In [4]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12684 entries, 0 to 12683
Data columns (total 26 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   destination           12684 non-null  object
 1   passanger             12684 non-null  object
 2   weather               12684 non-null  object
 3   temperature           12684 non-null  int64 
 4   time                  12684 non-null  object
 5   coupon                12684 non-null  object
 6   expiration            12684 non-null  object
 7   gender                12684 non-null  object
 8   age                   12684 non-null  object
 9   maritalStatus         12684 non-null  object
 10  has_children          12684 non-null  int64 
 11  education             12684 non-null  object
 12  occupation            12684 non-null  object
 13  income                12684 non-null  object
 14  car                   108 non-null    object
 15  Bar                   12577 non-null  object
 16  CoffeeHouse           12467 non-null  object
 17  CarryAway             12533 non-null  object
 18  RestaurantLessThan20  12554 non-null  object
 19  Restaurant20To50      12495 non-null  object
 20  toCoupon_GEQ5min      12684 non-null  int64 
 21  toCoupon_GEQ15min     12684 non-null  int64 
 22  toCoupon_GEQ25min     12684 non-null  int64 
 23  direction_same        12684 non-null  int64 
 24  direction_opp         12684 non-null  int64 
 25  Y                     12684 non-null  int64 
dtypes: int64(8), object(18)
memory usage: 2.5+ MB

There is a typo in passenger column name, lets fix that

In [5]:
df.rename(columns={"passanger": "passenger"}, inplace=True)

There are a few columns with name in camel case or starting with upper case. Lets change those to lowercase and with and underscore whereever required so that the column names are more consistent and easier to work with.

In [6]:
df = df.rename(columns={"maritalStatus": "marital_status", 
                    "Bar" : "bar",
                    "CoffeeHouse": "coffee_house",
                    "CarryAway":"takeaway",
                    "RestaurantLessThan20" : "restaurant_lt_20" ,
                    "Restaurant20To50" : "restaurant_gt_20",
                    "Y" : "coupon_accepted"})

Check for missing Data and Fill/Drop them accordingly¶

1.Investigate the dataset for missing or problematic data.

In [7]:
#Find columns with null values and count of values missing
empty_columns = df.columns[df.isna().sum() > 0]
df[empty_columns].isnull().sum()
Out[7]:
car                 12576
bar                   107
coffee_house          217
takeaway              151
restaurant_lt_20      130
restaurant_gt_20      189
dtype: int64
In [8]:
#Explore columns with missing data
print(df['car'].value_counts())
print(df['bar'].value_counts())
print(df['coffee_house'].value_counts())
print(df['takeaway'].value_counts())
print(df['restaurant_lt_20'].value_counts())
print(df['restaurant_gt_20'].value_counts())
Scooter and motorcycle                      22
Mazda5                                      22
do not drive                                22
crossover                                   21
Car that is too old to install Onstar :D    21
Name: car, dtype: int64
never    5197
less1    3482
1~3      2473
4~8      1076
gt8       349
Name: bar, dtype: int64
less1    3385
1~3      3225
never    2962
4~8      1784
gt8      1111
Name: coffee_house, dtype: int64
1~3      4672
4~8      4258
less1    1856
gt8      1594
never     153
Name: takeaway, dtype: int64
1~3      5376
4~8      3580
less1    2093
gt8      1285
never     220
Name: restaurant_lt_20, dtype: int64
less1    6077
1~3      3290
never    2136
4~8       728
gt8       264
Name: restaurant_gt_20, dtype: int64
In [9]:
#the values in car do not seem to be of any use for this analysis. Drop the column
df = df.drop(columns=['car'])

#Check the dataframe when bar is null

df[pd.isnull(df['bar'])].iloc[0:,0:20].head()
Out[9]:
destination passenger weather temperature time coupon expiration gender age marital_status has_children education occupation income bar coffee_house takeaway restaurant_lt_20 restaurant_gt_20 toCoupon_GEQ5min
517 No Urgent Place Friend(s) Sunny 80 10AM Bar 1d Male 50plus Divorced 1 Some college - no degree Management 62500−62500−74999 NaN NaN NaN NaN NaN 1
518 No Urgent Place Friend(s) Sunny 80 2PM Coffee House 1d Male 50plus Divorced 1 Some college - no degree Management 62500−62500−74999 NaN NaN NaN NaN NaN 1
519 No Urgent Place Friend(s) Sunny 80 2PM Restaurant(<20) 1d Male 50plus Divorced 1 Some college - no degree Management 62500−62500−74999 NaN NaN NaN NaN NaN 1
520 No Urgent Place Friend(s) Sunny 80 6PM Coffee House 2h Male 50plus Divorced 1 Some college - no degree Management 62500−62500−74999 NaN NaN NaN NaN NaN 1
521 No Urgent Place Friend(s) Sunny 55 2PM Coffee House 2h Male 50plus Divorced 1 Some college - no degree Management 62500−62500−74999 NaN NaN NaN NaN NaN 1

Since bar column signifies the number of times a customer goes to a bar, we do not have a way to accurately fill in the missing data. Assuming the missing data as never can skew the data unfavorly. So we drop the rows where this data is missing. The same logic applies to other columns as well i.e. for coffee_house, takeaway ,restaurant_lt_20 ,restaurant_gt_20

In [10]:
df = df.dropna(subset=['bar','coffee_house','takeaway','restaurant_lt_20','restaurant_gt_20'])

#Verify that there are no more missing values in the dataframe
#Print all columns with nan values along with the count
print(df[df.columns[df.isna().sum() > 0]].isnull().sum())

#Check how many records were dropped in total
print('Dropped row count : ' + str(12684 - df['destination'].size))
Series([], dtype: float64)
Dropped row count : 605
  1. Explore the income column
In [11]:
df['income'].value_counts()
Out[11]:
$25000 - $37499     1919
$12500 - $24999     1728
$100000 or More     1692
$37500 - $49999     1689
$50000 - $62499     1565
Less than $12500    1014
$62500 - $74999      840
$87500 - $99999      818
$75000 - $87499      814
Name: income, dtype: int64

Lets make the ranges more uniform

In [12]:
df['income'] = df['income'].str.replace("$", "", regex=False)
df['income'].replace(to_replace='Less than 12500',value='0 - 12500',inplace=True)
df['income'].replace(to_replace='100000 or More',value='100000 - 200000',inplace=True)

df['income'].value_counts()
Out[12]:
25000 - 37499      1919
12500 - 24999      1728
100000 - 200000    1692
37500 - 49999      1689
50000 - 62499      1565
0 - 12500          1014
62500 - 74999       840
87500 - 99999       818
75000 - 87499       814
Name: income, dtype: int64
  1. Explore education column
In [13]:
df['education'].value_counts()
Out[13]:
Some college - no degree                  4219
Bachelors degree                          4021
Graduate degree (Masters or Doctorate)    1808
Associates degree                         1066
High School Graduate                       877
Some High School                            88
Name: education, dtype: int64

Some of these values can be merged into singular value:

  • For those with 'Some college - no degree' , the highest degree for such drivers is high school graduates
  • Same goes for Some High School
  • Associates degree is esentially a bachelors degree.
In [14]:
df['education'].replace(to_replace='Some college - no degree',value='High School Graduate',inplace=True)
df['education'].replace(to_replace='Associates degree',value='Bachelors degree',inplace=True)
df['education'].replace(to_replace='Some High School',value='High School Graduate',inplace=True)


df['education'].value_counts()
Out[14]:
High School Graduate                      5184
Bachelors degree                          5087
Graduate degree (Masters or Doctorate)    1808
Name: education, dtype: int64
  1. Lets update age column, Ideally age should be int but its object data type.
In [15]:
print(df['age'].value_counts())

"""
Lets change the strings below21 and 50plus to numbers so that we can treat age as a numeric value. For simplicity,
I am chaging below 21 to 16 and  50plus to 51
"""

df['age'].replace(to_replace='below21',value=16,inplace=True)
df['age'].replace(to_replace='50plus',value=51,inplace=True)

df['age'] = df['age'].astype(int)
df.info()
21         2537
26         2399
31         1925
50plus     1732
36         1253
41         1065
46          664
below21     504
Name: age, dtype: int64
<class 'pandas.core.frame.DataFrame'>
Int64Index: 12079 entries, 22 to 12683
Data columns (total 25 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   destination        12079 non-null  object
 1   passenger          12079 non-null  object
 2   weather            12079 non-null  object
 3   temperature        12079 non-null  int64 
 4   time               12079 non-null  object
 5   coupon             12079 non-null  object
 6   expiration         12079 non-null  object
 7   gender             12079 non-null  object
 8   age                12079 non-null  int64 
 9   marital_status     12079 non-null  object
 10  has_children       12079 non-null  int64 
 11  education          12079 non-null  object
 12  occupation         12079 non-null  object
 13  income             12079 non-null  object
 14  bar                12079 non-null  object
 15  coffee_house       12079 non-null  object
 16  takeaway           12079 non-null  object
 17  restaurant_lt_20   12079 non-null  object
 18  restaurant_gt_20   12079 non-null  object
 19  toCoupon_GEQ5min   12079 non-null  int64 
 20  toCoupon_GEQ15min  12079 non-null  int64 
 21  toCoupon_GEQ25min  12079 non-null  int64 
 22  direction_same     12079 non-null  int64 
 23  direction_opp      12079 non-null  int64 
 24  coupon_accepted    12079 non-null  int64 
dtypes: int64(9), object(16)
memory usage: 2.4+ MB

Lets explore direction columns

In [16]:
print(df['direction_opp'].value_counts())
print(df['direction_same'].value_counts())

#The values in the two column seems to be directly related to each other

print(df.query('direction_opp == 1')['direction_same'].value_counts())
df.query('direction_same == 1')['direction_opp'].value_counts()

#Lets club the two columns into one - direction. 
#A value of 0 represents going in the same direction whereas 1 will represnt opposite direction
df['direction'] = np.where(df['direction_same'] == 1, 0 , 1)

#verify the value in the new column
df

#Drop the two redundant columns
df = df.drop(columns=['direction_same', 'direction_opp'])
df
1    9480
0    2599
Name: direction_opp, dtype: int64
0    9480
1    2599
Name: direction_same, dtype: int64
0    9480
Name: direction_same, dtype: int64
Out[16]:
destination passenger weather temperature time coupon expiration gender age marital_status ... bar coffee_house takeaway restaurant_lt_20 restaurant_gt_20 toCoupon_GEQ5min toCoupon_GEQ15min toCoupon_GEQ25min coupon_accepted direction
22 No Urgent Place Alone Sunny 55 2PM Restaurant(<20) 1d Male 21 Single ... never less1 4~8 4~8 less1 1 0 0 1 1
23 No Urgent Place Friend(s) Sunny 80 10AM Coffee House 2h Male 21 Single ... never less1 4~8 4~8 less1 1 0 0 0 1
24 No Urgent Place Friend(s) Sunny 80 10AM Bar 1d Male 21 Single ... never less1 4~8 4~8 less1 1 0 0 1 1
25 No Urgent Place Friend(s) Sunny 80 10AM Carry out & Take away 2h Male 21 Single ... never less1 4~8 4~8 less1 1 1 0 0 1
26 No Urgent Place Friend(s) Sunny 80 2PM Coffee House 1d Male 21 Single ... never less1 4~8 4~8 less1 1 0 0 0 1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
12679 Home Partner Rainy 55 6PM Carry out & Take away 1d Male 26 Single ... never never 1~3 4~8 1~3 1 0 0 1 0
12680 Work Alone Rainy 55 7AM Carry out & Take away 1d Male 26 Single ... never never 1~3 4~8 1~3 1 0 0 1 1
12681 Work Alone Snowy 30 7AM Coffee House 1d Male 26 Single ... never never 1~3 4~8 1~3 1 0 0 0 0
12682 Work Alone Snowy 30 7AM Bar 1d Male 26 Single ... never never 1~3 4~8 1~3 1 1 1 0 1
12683 Work Alone Sunny 80 7AM Restaurant(20-50) 2h Male 26 Single ... never never 1~3 4~8 1~3 1 0 0 0 0

12079 rows × 24 columns

The coupon_accepted column has an int value of 0 and 1. lets add another column with yes/no so that its easier to read in charts

In [17]:
df['coupon_accepted_str'] = np.where(df['coupon_accepted'] == 0, 'No', 'Yes')

For the data pertaining the number of times a driver visit various establishments (1.e. bar, coffee shop etc.) , the values are a string defining range of days. Lets replace these values with a numeric value so that it is easier to analyze. Lets replace the values with the lower bound of the range i.e. the minimum number of days a driver visits the establishment per month

In [18]:
df.replace(to_replace='less1',value='Never', inplace=True)
df.replace(to_replace='never',value='Never', inplace=True)
df.replace(to_replace='gt8',value='8 or more', inplace=True)

For the columns toCoupon_GEQ5min, toCoupon_GEQ15min, toCoupon_GEQ25min, I want to analyze if we can combine these to a single column. These columns denote how far is the driver from the coupon place.

In [19]:
toCoupon_GEQ5min = df['toCoupon_GEQ5min'].value_counts()
print(f'toCoupon_GEQ5min \n {toCoupon_GEQ5min} \n')
toCoupon_GEQ15min = df['toCoupon_GEQ15min'].value_counts()
print(f'toCoupon_GEQ15min \n {toCoupon_GEQ15min} \n')
toCoupon_GEQ25min = df['toCoupon_GEQ25min'].value_counts()
print(f'toCoupon_GEQ25min \n {toCoupon_GEQ25min} \n')
print(df.shape)
toCoupon_GEQ5min 
 1    12079
Name: toCoupon_GEQ5min, dtype: int64 

toCoupon_GEQ15min 
 1    6779
0    5300
Name: toCoupon_GEQ15min, dtype: int64 

toCoupon_GEQ25min 
 0    10637
1     1442
Name: toCoupon_GEQ25min, dtype: int64 

(12079, 25)

All the rows have 1 for col toCoupon_GEQ5min. This means that all the locations are more than or equal to 5 mins away. 6779 rows are 15 mins away and 1442 are 25 mins away. How do we get the count of locations 5 mins away?

  • Out of 12079, 6779 places are 15 mins or more away. This mean that 12079 - 6779 = 5300 places are between 5 and 15 mins away.
  • Out of 6779, 1442 places are 25 mins or more away. This mean that 6779 - 1442 = 5337 places are between 15 and 25 mins away And 1442 places are more than 25 mins away

    Lets add a new column replacing these three that lists the distance. For simplicity, we will take the distance as 10 mins (for places 5-15 mins away), 20 for places 15-25 mins away and 30 for places more than 25 mins away

In [20]:
def get_distance(row):
    toCoupon_GEQ5min = row['toCoupon_GEQ5min']
    toCoupon_GEQ15min = row['toCoupon_GEQ15min']
    toCoupon_GEQ25min = row['toCoupon_GEQ25min']
    if (toCoupon_GEQ5min == 1) & (toCoupon_GEQ15min == 0):
        return 10
    if (toCoupon_GEQ15min == 1) & (toCoupon_GEQ25min == 1):
        return 30
    else:
        return 20
    
    
df['driving_time'] = df.apply(get_distance , axis = 1)
df['driving_time'].value_counts()

#drop redundant columns
df = df.drop(columns=['toCoupon_GEQ5min','toCoupon_GEQ15min' ,'toCoupon_GEQ25min'])

Lets look at the final data set

In [21]:
df
Out[21]:
destination passenger weather temperature time coupon expiration gender age marital_status ... income bar coffee_house takeaway restaurant_lt_20 restaurant_gt_20 coupon_accepted direction coupon_accepted_str driving_time
22 No Urgent Place Alone Sunny 55 2PM Restaurant(<20) 1d Male 21 Single ... 62500 - 74999 Never Never 4~8 4~8 Never 1 1 Yes 10
23 No Urgent Place Friend(s) Sunny 80 10AM Coffee House 2h Male 21 Single ... 62500 - 74999 Never Never 4~8 4~8 Never 0 1 No 10
24 No Urgent Place Friend(s) Sunny 80 10AM Bar 1d Male 21 Single ... 62500 - 74999 Never Never 4~8 4~8 Never 1 1 Yes 10
25 No Urgent Place Friend(s) Sunny 80 10AM Carry out & Take away 2h Male 21 Single ... 62500 - 74999 Never Never 4~8 4~8 Never 0 1 No 20
26 No Urgent Place Friend(s) Sunny 80 2PM Coffee House 1d Male 21 Single ... 62500 - 74999 Never Never 4~8 4~8 Never 0 1 No 10
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
12679 Home Partner Rainy 55 6PM Carry out & Take away 1d Male 26 Single ... 75000 - 87499 Never Never 1~3 4~8 1~3 1 0 Yes 10
12680 Work Alone Rainy 55 7AM Carry out & Take away 1d Male 26 Single ... 75000 - 87499 Never Never 1~3 4~8 1~3 1 1 Yes 10
12681 Work Alone Snowy 30 7AM Coffee House 1d Male 26 Single ... 75000 - 87499 Never Never 1~3 4~8 1~3 0 0 No 10
12682 Work Alone Snowy 30 7AM Bar 1d Male 26 Single ... 75000 - 87499 Never Never 1~3 4~8 1~3 0 1 No 30
12683 Work Alone Sunny 80 7AM Restaurant(20-50) 2h Male 26 Single ... 75000 - 87499 Never Never 1~3 4~8 1~3 0 0 No 10

12079 rows × 23 columns

Exploratory Vizualizations¶

Lets explore the data set and see what the data looks like by vizualizing a few attributes

  1. What is the age distribution of the customers
In [22]:
#Plot a histogram for age 
fig = px.histogram(df, x='age', title='Age Distribution',  labels={'age':'Age'})
fig.update_layout( yaxis_title="Count")
fig.show()
152025303540455005001000150020002500
Age DistributionAgeCount
plotly-logomark
  1. Income Distribution of customers
In [23]:
#Income Distribution

fig = px.histogram(df['income'].astype('str'), x='income', title='Income Distribution').update_xaxes(categoryorder='total ascending')
fig.update_layout( xaxis_title="Income Range in $" ,yaxis_title="Count")
fig.update_xaxes(tickangle = 45,title_standoff = 100)

fig.show()
75000 - 8749987500 - 9999962500 - 749990 - 1250050000 - 6249937500 - 49999100000 - 20000012500 - 2499925000 - 374990500100015002000
Income DistributionIncome Range in $Count
plotly-logomark
  1. What proportion of the total observations chose to accept the coupon?
In [24]:
coupons_accepted = df.query('coupon_accepted_str == "Yes"')['coupon_accepted_str'].count()
total_coupons = df['coupon_accepted_str'].count()
coupons_accepted_pct = round(coupons_accepted/total_coupons*100,2)
print('Proportion to accept coupon : ' + str(coupons_accepted_pct) + '%')

fig = px.histogram(df['coupon_accepted_str'], x="coupon_accepted_str", histnorm='percent',
                   labels={'coupon_accepted_str':'Coupon Accepted'}, text_auto=True, title='Coupon Acceptance')
fig.update_layout( yaxis_title="Percentage")

fig.show()
Proportion to accept coupon : 56.93%
56.9335243.06648YesNo01020304050
Coupon AcceptanceCoupon AcceptedPercentage
plotly-logomark
  1. Use a bar plot to visualize the coupon column.
In [25]:
fig = px.histogram(df, x="coupon", color="coupon_accepted_str",
                   labels={'coupon':'Issued Coupons','coupon_accepted_str':'Coupon Accepted'},
                   title='Coupons Issued',
                    text_auto=True).update_xaxes(categoryorder='total ascending')
fig.update_layout( yaxis_title="Count")
fig.show()
63278816821881189478511255987721922Restaurant(20-50)BarCarry out & Take awayRestaurant(<20)Coffee House05001000150020002500300035004000
Coupon AcceptedYesNoCoupons IssuedIssued CouponsCount
plotly-logomark

Observations :

  • Most of the issued coupons are for coffee house, least coupons are issued for restuarants(20-50).
  • Most of the take away coupons have been accepted
  • Bar and Restaurant(20-50) has the least acceptance rate for the issued coupons
  1. Use a histogram to visualize the temperature column.
In [26]:
fig = px.histogram(df, x="temperature", color="coupon_accepted_str",
                   labels={'temperature':'Temperature','coupon_accepted_str':'Coupon Accepted'},
                   title='Temperature based acceptance',
                   category_orders={"temperature":[30,50,80]},
                    text_auto=True).update_xaxes(categoryorder='total ascending')
fig.update_layout( yaxis_title="Count",bargap=0.3)
fig.update_xaxes(type='category')
fig.show()
1179196737311016169524913055800100020003000400050006000
Coupon AcceptedYesNoTemperature based acceptanceTemperatureCount
plotly-logomark
  1. What occupations do the drivers have?
In [27]:
fig = px.histogram(df, x='occupation', title='Occupation Distribution',
            labels={'occupation':'Occupation'}).update_xaxes(categoryorder='total ascending')

fig.update_layout(yaxis_title='Count')
fig.update_xaxes(tickangle = 45)

fig.show()
Farming Fishing & ForestryBuilding & Grounds Cleaning & MaintenanceProduction OccupationsInstallation Maintenance & RepairPersonal Care & ServiceConstruction & ExtractionLife Physical Social ScienceArchitecture & EngineeringProtective ServiceTransportation & Material MovingLegalCommunity & Social ServicesHealthcare Practitioners & TechnicalHealthcare SupportFood Preparation & Serving RelatedRetiredBusiness & FinancialArts Design Entertainment Sports & MediaOffice & Administrative SupportManagementEducation&Training&LibrarySales & RelatedComputer & MathematicalStudentUnemployed050010001500
Occupation DistributionOccupationCount
plotly-logomark
  1. Where are the drivers headed?
In [28]:
fig = px.histogram(df, x='destination',color='passenger', title='Destination Distribution',
            labels={'destination':'Destination', 'passenger':'Passenger'}).update_xaxes(categoryorder='total ascending')

fig.update_layout(yaxis_title='Count')
fig.update_xaxes(tickangle = 45)

fig.show()
WorkHomeNo Urgent Place0100020003000400050006000
PassengerAloneFriend(s)Kid(s)PartnerDestination DistributionDestinationCount
plotly-logomark

Most drivers are headed to non urgent place with friends. This can be an interesting category to analyze

  1. Lets look at the driving time distribution
In [29]:
fig = px.histogram(df, x='driving_time',color='coupon', title='Driving time for Issued coupons',
                   nbins=3,
            labels={'driving_time':'Driving time in mins', 'coupon':'Issued Coupon'}).update_xaxes(categoryorder='total ascending')

fig.update_layout(yaxis_title='Count',bargap=0.3)
fig.update_xaxes(tickangle = 45)

fig.show()
101520253035010002000300040005000
Issued CouponRestaurant(<20)Coffee HouseBarCarry out & Take awayRestaurant(20-50)Driving time for Issued couponsDriving time in minsCount
plotly-logomark

Utility Method

Adding a method that can be reused in the analysis below. This method is used to compute the rate of acceptance based on the data frame and the grouping columns passed

In [30]:
def get_acceptance_pct(df, cols):
    coupons_accepted = df.groupby(cols)[["coupon_accepted"]].sum()
    coupons_issued = df.groupby(cols)[["coupon_accepted"]].count()
    coupons_accepted_pct = round(coupons_accepted/coupons_issued*100,2)
    coupons_accepted_pct['coupon_issued'] = coupons_issued['coupon_accepted']
    coupons_accepted_pct.rename(columns={"coupon_accepted": "acceptance_pct"}, inplace=True) #rename to %
    coupons_accepted_pct['coupon_accepted'] = coupons_accepted['coupon_accepted'] # this is the actual accepted count

    return coupons_accepted_pct

Investigating the Bar Coupons¶

Let us explore just the bar related coupons.

  1. Create a new DataFrame that contains just the bar coupons.
In [31]:
df_bar = df.query('coupon == "Bar"')
df_bar.head()
Out[31]:
destination passenger weather temperature time coupon expiration gender age marital_status ... income bar coffee_house takeaway restaurant_lt_20 restaurant_gt_20 coupon_accepted direction coupon_accepted_str driving_time
24 No Urgent Place Friend(s) Sunny 80 10AM Bar 1d Male 21 Single ... 62500 - 74999 Never Never 4~8 4~8 Never 1 1 Yes 10
35 Home Alone Sunny 55 6PM Bar 1d Male 21 Single ... 62500 - 74999 Never Never 4~8 4~8 Never 1 0 Yes 10
39 Work Alone Sunny 55 7AM Bar 1d Male 21 Single ... 62500 - 74999 Never Never 4~8 4~8 Never 1 1 Yes 30
46 No Urgent Place Friend(s) Sunny 80 10AM Bar 1d Male 46 Single ... 12500 - 24999 Never 4~8 1~3 1~3 Never 0 1 No 10
57 Home Alone Sunny 55 6PM Bar 1d Male 46 Single ... 12500 - 24999 Never 4~8 1~3 1~3 Never 0 0 No 10

5 rows × 23 columns

  1. What proportion of bar coupons were accepted?
In [32]:
fig = px.histogram(df_bar['coupon_accepted_str'], x="coupon_accepted_str", histnorm='percent',
                   labels={'coupon_accepted_str':'Coupon Accepted'}, text_auto=True , title='Overall bar coupons acceptance')
fig.update_layout( yaxis_title="Percentage")

fig.show()
41.1918558.80815YesNo0102030405060
Overall bar coupons acceptanceCoupon AcceptedPercentage
plotly-logomark

A total of 41.19% bar coupons were accepted by drivers

  1. Compare the acceptance rate between those who went to a bar 3 or fewer times a month to those who went more.
In [33]:
#Compute acceptance rate as a % value; We are using the method defined earlier to compute this
bar_coupons_accepted_pct = get_acceptance_pct(df_bar, ['bar'])
bar_coupons_accepted_pct
Out[33]:
acceptance_pct coupon_issued coupon_accepted
bar
1~3 64.64 379 245
4~8 77.55 147 114
8 or more 71.74 46 33
Never 29.53 1341 396
In [34]:
#use a bar plot to visualize the acceptance based on bar visits

fig = px.histogram(df_bar, x=["bar"], color="coupon_accepted_str",
                   labels={'value':'Bar visits per month', 'coupon_accepted_str':'Coupon Accepted'},
                   title='Bar Coupons Acceptance based on bar visits').update_xaxes(categoryorder='total ascending')
fig.update_layout( yaxis_title="Count")

#Show acceptance percentage value on top of each bar 
fig.add_trace(go.Scatter(
    x=bar_coupons_accepted_pct.index, 
    y=bar_coupons_accepted_pct['coupon_issued'],
    text=bar_coupons_accepted_pct['acceptance_pct'],
    mode='text',
    textposition='top center',
    textfont=dict(
        size=18,
    ),
    showlegend=False
))

fig.show()
64.6477.5571.7429.538 or more4~81~3Never0200400600800100012001400
Coupon AcceptedYesNoBar Coupons Acceptance based on bar visitsBar visits per monthCount
plotly-logomark

This bar chart shows that drivers who never go to bars has a significantly less acceptance rate for the coupons issued for bar.

  1. Compare the acceptance rate between drivers who go to a bar more than once a month and are over the age of 25 to all the others. Is there a difference?
In [35]:
#Plot bar chart to compare acceptance based on age
df_bar_gt1 = df_bar.query('bar != "Never"') 
fig = px.histogram(df_bar_gt1, x="age", color="coupon_accepted_str",
                   labels={'age':'Age of driver', 'coupon_accepted_str':'Coupon Accepted'},
                   title='Bar coupons acceptance based on driver age',
                   barmode='group',
                    text_auto=True).update_xaxes(categoryorder='total ascending')
fig.update_layout( yaxis_title="Count")
fig.show()
41101226120211242649373220114211722273237424752020406080100120
Coupon AcceptedYesNoBar coupons acceptance based on driver ageAge of driverCount
plotly-logomark

Driver over the age of 21 had a higher acceptance rate compared to younger drivers.

  1. Use the same process to compare the acceptance rate between drivers who go to bars more than once a month and had passengers that were not a kid and had occupations other than farming, fishing, or forestry.
In [36]:
#filter the bar dataset for users visit bar atleast once a month and have an occupation other than farming 
df_bar_visit_gt1 = df_bar.query("bar != 'Never'").query('occupation != "Farming Fishing & Forestry"')

#Group based on passenger alone
df_bar_visit_gt1_grp2 = get_acceptance_pct(df_bar_visit_gt1 ,['passenger','bar'] )
df_bar_visit_gt1_grp2


df_bar_visit_gt1_grp2
Out[36]:
acceptance_pct coupon_issued coupon_accepted
passenger bar
Alone 1~3 65.12 215 140
4~8 82.00 100 82
8 or more 73.08 26 19
Friend(s) 1~3 82.67 75 62
4~8 72.00 25 18
8 or more 68.75 16 11
Kid(s) 1~3 32.35 34 11
4~8 57.14 7 4
8 or more 100.00 1 1
Partner 1~3 58.18 55 32
4~8 66.67 15 10
8 or more 66.67 3 2
In [37]:
fig = px.histogram(df_bar_visit_gt1_grp2.reset_index(), x='bar', y='acceptance_pct' , color="passenger",
                   labels={'bar':'Bar visits per month', 'passenger':'Passenger'},
                   barmode='group', width=800, height=600, text_auto=True,
                   title='Bar Coupons Acceptance Rate based on Passenger').update_xaxes(categoryorder='total ascending')

fig.update_xaxes(tickangle=45)
fig.update_layout( yaxis_title="Acceptance %")
65.128273.0882.677268.7532.3557.1410058.1866.6766.671~34~88 or more020406080100
PassengerAloneFriend(s)Kid(s)PartnerBar Coupons Acceptance Rate based on PassengerBar visits per monthAcceptance %
plotly-logomark

In the bar chart above , we can ignore the green bar for 8 or more visit because there is a singular data point .

The acceptance rate is the least when drivers travel with kids.

In [38]:
df_bar_visit_gt2 = df_bar.query("bar != 'Never'").query('passenger != "Kid(s)"')

df_bar_visit_gt1_grp2 = get_acceptance_pct(df_bar_visit_gt2, ['occupation'])

fig = px.histogram(df_bar_visit_gt1_grp2.reset_index(), x='occupation', y='acceptance_pct' , 
                   labels={'occupation':'Occupation', 'passenger':'Passenger'},
                   barmode='group', width=1000, height=800, text_auto=True,
                   title='Bar Coupons Acceptance Rate based on Occupation').update_xaxes(categoryorder='total ascending')

fig.update_xaxes(tickangle=45)
fig.update_layout( yaxis_title="Acceptance %")
40505053.8555.5657.4558.826066.6767.1968.857576.9277.2778.5778.578080.2881.483.3385.7185.7190.91100Education&Training&LibraryBuilding & Grounds Cleaning & MaintenanceRetiredTransportation & Material MovingInstallation Maintenance & RepairComputer & MathematicalLegalBusiness & FinancialLife Physical Social ScienceStudentSales & RelatedPersonal Care & ServiceOffice & Administrative SupportArts Design Entertainment Sports & MediaFood Preparation & Serving RelatedHealthcare Practitioners & TechnicalProduction OccupationsUnemployedManagementCommunity & Social ServicesConstruction & ExtractionProtective ServiceHealthcare SupportArchitecture & Engineering020406080100
Bar Coupons Acceptance Rate based on OccupationOccupationAcceptance %
plotly-logomark

There are certain occupations where the acceptance is much higher like 'Architecture & Engineering or Healthcare Support'. Income might have a role to play in such cases

  1. Compare the acceptance rates between those drivers who:
  • go to bars more than once a month, had passengers that were not a kid, and were not widowed OR
  • go to bars more than once a month and are under the age of 30 OR
  • go to cheap restaurants more than 4 times a month and income is less than 50K.
In [39]:
#create a new data set for customers who go to bars more than once a month,
#had passengers that were not a kid, and were not widowed 

df2 = df_bar.query("bar!= 'Never'").query("passenger != 'Kid(s)'").query("marital_status != 'Widowed'")

df2_grp = df2.groupby(['passenger','marital_status'])[['coupon_accepted']].count()
df2_grp = df2_grp.rename(columns={'coupon_accepted':'coupon_issued'})
df2_grp['coupon_accepted'] = df2.groupby(['passenger','marital_status'])[['coupon_accepted']].sum()
df2_grp['acceptance_pct'] = round(df2_grp['coupon_accepted']/df2_grp['coupon_issued']*100,2)

df2_grp
Out[39]:
coupon_issued coupon_accepted acceptance_pct
passenger marital_status
Alone Divorced 10 9 90.00
Married partner 77 58 75.32
Single 189 137 72.49
Unmarried partner 65 37 56.92
Friend(s) Divorced 3 2 66.67
Married partner 22 16 72.73
Single 71 57 80.28
Unmarried partner 20 16 80.00
Partner Married partner 29 19 65.52
Single 7 5 71.43
Unmarried partner 37 20 54.05
In [40]:
fig = px.histogram(df2_grp.reset_index(), x='marital_status', y='acceptance_pct' , color="passenger",
                   labels={'marital_status':'Marital Status', 'passenger':'Passenger'},
                   barmode='group', width=800, height=600, text_auto=True,
                   title='Bar Coupons Acceptance Rate based on Occupation and Passenger').update_xaxes(categoryorder='total ascending')

fig.update_xaxes(tickangle=45)
fig.update_layout( yaxis_title="Acceptance %")
9056.9275.3272.4966.678072.7380.2854.0565.5271.43DivorcedUnmarried partnerMarried partnerSingle0102030405060708090
PassengerAloneFriend(s)PartnerBar Coupons Acceptance Rate based on Occupation and PassengerMarital StatusAcceptance %
plotly-logomark
In [41]:
#create a new data set for customers who go to bars more than once a month and are undere the age of 30

df_age_lt30= df_bar.query('bar != "Never"').query('age < 30')

df_age_lt30_grp = df_age_lt30.groupby(['bar','age'])[['coupon_accepted']].count()
df_age_lt30_grp = df_age_lt30_grp.rename(columns={'coupon_accepted':'coupon_issued'})
df_age_lt30_grp['coupon_accepted'] = df_age_lt30.groupby(['bar','age'])[['coupon_accepted']].sum()
df_age_lt30_grp['acceptance_pct'] = round(df_age_lt30_grp['coupon_accepted']/df_age_lt30_grp['coupon_issued']*100,2)

#df2_grp = df2_grp.reset_index()
df_age_lt30_grp
Out[41]:
coupon_issued coupon_accepted acceptance_pct
bar age
1~3 16 6 2 33.33
21 103 67 65.05
26 89 63 70.79
4~8 21 44 33 75.00
26 52 44 84.62
8 or more 16 4 2 50.00
21 12 10 83.33
26 18 15 83.33
In [42]:
#go to bars more than once a month and are under the age of 30 OR
fig = px.histogram(df_age_lt30_grp.reset_index(), x='age', y='acceptance_pct' , color="bar",
                   labels={'age':'Age', 'bar':'Bar visits per month'},
                   barmode='group', width=800, height=600, text_auto=True,
                   title='Bar Coupons Acceptance Rate based on age').update_xaxes(categoryorder='total ascending')

fig.update_layout( yaxis_title="Acceptance %")
33.3365.0570.797584.625083.3383.3317222701020304050607080
Bar visits per month1~34~88 or moreBar Coupons Acceptance Rate based on ageAgeAcceptance %
plotly-logomark
In [43]:
#create a new data set for customers who go to cheap restaurants more than 4 times a month and income is less than 50K
def income_lt_50k(row):
    income_range = row['income']
    income_split = income_range.split('-',2)
    max_income = int(income_split[1])
    return max_income < 50_000


df3 = df_bar[df_bar.apply(income_lt_50k, axis=1)] #Filter dataset based on income
df_bar3 = df3.groupby(['restaurant_lt_20','income'])[['coupon_accepted']].count()

df_bar3 = df_bar3.rename(columns={'coupon_accepted':'coupon_issued'})
df_bar3['coupon_accepted'] = df_bar.groupby(['restaurant_lt_20','income'])[['coupon_accepted']].sum()
df_bar3['acceptance_pct'] = round(df_bar3['coupon_accepted']/df_bar3['coupon_issued']*100,2)

df_bar3
Out[43]:
coupon_issued coupon_accepted acceptance_pct
restaurant_lt_20 income
1~3 0 - 12500 53 30 56.60
12500 - 24999 123 53 43.09
25000 - 37499 141 61 43.26
37500 - 49999 131 37 28.24
4~8 0 - 12500 58 20 34.48
12500 - 24999 72 30 41.67
25000 - 37499 57 25 43.86
37500 - 49999 50 20 40.00
8 or more 0 - 12500 15 12 80.00
12500 - 24999 24 11 45.83
25000 - 37499 17 8 47.06
37500 - 49999 40 26 65.00
Never 0 - 12500 33 11 33.33
12500 - 24999 52 20 38.46
25000 - 37499 85 42 49.41
37500 - 49999 26 7 26.92
In [44]:
#go to bars more than once a month and are under the age of 30 OR
fig = px.histogram(df_bar3.reset_index(), x='income', y='acceptance_pct' , color="restaurant_lt_20",
                   labels={'income':'Income', 'restaurant_lt_20':'Visits Restaurant < 20 '},
                   barmode='group', width=800, height=600, text_auto=True,
                   title='Bar Coupons Acceptance Rate based on Income').update_xaxes(categoryorder='total ascending')

fig.update_layout( yaxis_title="Acceptance %")
28.2443.0943.2656.64041.6743.8634.486545.8347.068026.9238.4649.4133.3337500 - 4999912500 - 2499925000 - 374990 - 1250001020304050607080
Visits Restaurant < 20 1~34~88 or moreNeverBar Coupons Acceptance Rate based on IncomeIncomeAcceptance %
plotly-logomark
  1. Based on these observations, what do you hypothesize about drivers who accepted the bar coupons?
  • Drivers who visits bar more than 4 times a month are more likely to accept a bar coupon
  • Drivers who are above an age of 25 are more likely to accept the coupon
  • Drivers travelling alone or with friends are more likely to accept bar coupon
  • Drivers who tend to visit cheap restuarants over 8 times a month are more likely to accept the bar coupon

Investigating Coffee House coupons¶

Coffee house coupons have an overall acceptance of <50%. Lets analyze who are most likely to accept coffee house coupon

  1. Create a dataset containing coffee house coupons
In [45]:
df_coffee = df.query('coupon=="Coffee House"')
df_coffee
Out[45]:
destination passenger weather temperature time coupon expiration gender age marital_status ... income bar coffee_house takeaway restaurant_lt_20 restaurant_gt_20 coupon_accepted direction coupon_accepted_str driving_time
23 No Urgent Place Friend(s) Sunny 80 10AM Coffee House 2h Male 21 Single ... 62500 - 74999 Never Never 4~8 4~8 Never 0 1 No 10
26 No Urgent Place Friend(s) Sunny 80 2PM Coffee House 1d Male 21 Single ... 62500 - 74999 Never Never 4~8 4~8 Never 0 1 No 10
27 No Urgent Place Friend(s) Sunny 80 2PM Coffee House 2h Male 21 Single ... 62500 - 74999 Never Never 4~8 4~8 Never 0 1 No 20
28 No Urgent Place Friend(s) Sunny 80 2PM Coffee House 1d Male 21 Single ... 62500 - 74999 Never Never 4~8 4~8 Never 0 1 No 20
30 No Urgent Place Friend(s) Sunny 80 6PM Coffee House 2h Male 21 Single ... 62500 - 74999 Never Never 4~8 4~8 Never 0 1 No 10
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
12656 Home Alone Snowy 30 10PM Coffee House 2h Male 31 Married partner ... 100000 - 200000 Never Never 4~8 8 or more Never 0 1 No 20
12659 Work Alone Snowy 30 7AM Coffee House 1d Male 31 Married partner ... 100000 - 200000 Never Never 4~8 8 or more Never 0 0 No 10
12674 Home Alone Rainy 55 10PM Coffee House 2h Male 26 Single ... 75000 - 87499 Never Never 1~3 4~8 1~3 0 0 No 10
12675 Home Alone Snowy 30 10PM Coffee House 2h Male 26 Single ... 75000 - 87499 Never Never 1~3 4~8 1~3 0 1 No 20
12681 Work Alone Snowy 30 7AM Coffee House 1d Male 26 Single ... 75000 - 87499 Never Never 1~3 4~8 1~3 0 0 No 10

3816 rows × 23 columns

  1. What proportion of coffee house coupons were accepted?
In [46]:
fig = px.histogram(df_coffee['coupon_accepted_str'], x="coupon_accepted_str", histnorm='percent',
                   labels={'coupon_accepted_str':'Coupon Accepted'}, text_auto=True , title='Coffee House Coupons acceptance Rate')
fig.update_layout( yaxis_title="Percentage")

fig.show()
50.3668849.63312NoYes01020304050
Coffee House Coupons acceptance RateCoupon AcceptedPercentage
plotly-logomark

A total of 49.6% of all issued coffee house coupons were accepted

  1. Does the frequency of visiting coffee house impacts acceptance rate of the coupon?
In [47]:
#Compute acceptance rate as a % value
coffee_coupons_accepted_pct = get_acceptance_pct(df_coffee, ['coffee_house'])
coffee_coupons_accepted_pct
#use a bar plot to visualize the acceptance rate based on coffee house visit freq

fig = px.histogram(df_coffee, x="coffee_house", color="coupon_accepted_str",
                   labels={'coffee_house':'Coffee House visits per month', 'coupon_accepted_str':'Coupons Accepted'},
                   title='Coffee House Coupons Acceptance based on coffee house visit freq').update_xaxes(categoryorder='total ascending')
fig.update_layout( yaxis_title="Coupons Count")

#Show acceptance percentage value on top of each bar 
fig.add_trace(go.Scatter(
    x=coffee_coupons_accepted_pct.index, 
    y=coffee_coupons_accepted_pct['coupon_issued'],
    text=coffee_coupons_accepted_pct['acceptance_pct'],
    mode='text',
    textposition='top center',
    textfont=dict(
        size=18,
    ),
    showlegend=False
))

fig.show()
64.7768.2465.7934.038 or more4~81~3Never0500100015002000
Coupons AcceptedNoYesCoffee House Coupons Acceptance based on coffee house visit freqCoffee House visits per monthCoupons Count
plotly-logomark

Acceptance rate of coupon is close to 65% when driver visits coffee house >=1 time a month

This suggests that drivers who visit coffee house more than once a month are more likely to accept the coffee house coupon compared to those who never visit coffee house

  1. Does Temperature or Weather impacts the acceptance of the coupon?
In [48]:
#Compute acceptance rate as a % value

coffee_coupons_accepted_pct = get_acceptance_pct(df_coffee, ['temperature','weather'])
coffee_coupons_accepted_pct
Out[48]:
acceptance_pct coupon_issued coupon_accepted
temperature weather
30 Snowy 42.81 285 122
Sunny 71.43 14 10
55 Rainy 51.63 215 111
Sunny 43.82 1004 440
80 Sunny 52.70 2298 1211

Acceptance rate varies between 44 - 52% for all the temperature. Sunny weather at 30F seems to have a higher acceptance rate , but the total coupons in this case is quite less and can not be considered as one of the factors. So temperature or weather doesnt seem to play a big role in coupon acceptance

  1. What about age, gender and education?
In [49]:
#Compute acceptance rate as a % value
coffee_coupons_accepted_pct = get_acceptance_pct(df_coffee, ['age','gender','education'])
#print(coffee_coupons_accepted_pct)

#Plot bar charts based on gender to show the comparison
fig = px.histogram(coffee_coupons_accepted_pct.reset_index(), x="age", y='acceptance_pct',color='education',
                   facet_col="gender",nbins=8,
                   labels={'age':'Age', 'gender':'Gender' , 'education':'Education'},
                    barmode='group', text_auto=True,width = 1200,
                   category_orders={"education":["High School Graduate","Bachelors degree"]},
                   title='Coffee House coupon acceptance based on gender, age and education').update_xaxes(categoryorder='total ascending')
fig.update_layout( yaxis_title="Acceptance %", bargap=0.3)
fig.show()
47.3754.3752.5552.655.8139.6843.8432.7166.6753.8953.8846.2132.0848.894542.5245.7652.8353.6262.526.6753.33203040500102030405060708081.9349.5558.8845.6159.2165.5768.424052.2345.0253.6639.3633.3332.3529.6362.1634.6240.4340.4375.6869.2320304050
EducationHigh School GraduateBachelors degreeGraduate degree (Masters or Doctorate)Coffee House coupon acceptance based on gender, age and educationAgeAgeAcceptance %Gender=FemaleGender=Male
plotly-logomark

There seems to be an almost equal divide for gender. Age wise distribution shows that Male aged 16 or younger are more likely to accept a coupon for coffee house compared to female of the same age

  1. Analyze destination and companion based acceptance rate

During univariate analysis, we saw that most of the drivers are headed to non urgent place with friends. Can this be a factor influencing the acceptance rate?

In [50]:
#Compute acceptance rate as a % value
coffee_coupons_accepted_pct = get_acceptance_pct(df_coffee, ['destination','passenger'])
coffee_coupons_accepted_pct
Out[50]:
acceptance_pct coupon_issued coupon_accepted
destination passenger
Home Alone 35.05 833 292
Partner 54.55 55 30
No Urgent Place Alone 57.68 449 259
Friend(s) 59.74 1175 702
Kid(s) 47.15 193 91
Partner 57.20 236 135
Work Alone 44.00 875 385
In [51]:
fig = px.histogram(coffee_coupons_accepted_pct.reset_index(), x='passenger', y='acceptance_pct' , color="destination",
                   labels={'destination':'Destination', 'passenger':'Passenger'},
                   barmode='group', width=800, height=600, text_auto=True,
                   title='Coffee House Coupons Acceptance Rate based on destination and passenger').update_xaxes(categoryorder='total ascending')

fig.update_layout( yaxis_title="Acceptance %")
54.5535.0547.1559.7457.257.6844Kid(s)Friend(s)PartnerAlone0102030405060
DestinationHomeNo Urgent PlaceWorkCoffee House Coupons Acceptance Rate based on destination and passengerPassengerAcceptance %
plotly-logomark

The data set does not have enough data points to be able to compare the acceptance rate based on these two features. The acceptance rate is almost the same for the data available. The plot does tell us however that a driver is more likley to accept a coupon driving to non urgent place than they are driving home alone.

  1. Analyze direction of travel based acceptance rate
In [52]:
#compute acceptance rate based on direction of travel from destination
coffee_coupons_accepted_pct_dir = get_acceptance_pct(df_coffee, ['direction','destination','coffee_house'])
coffee_coupons_accepted_pct_dir
Out[52]:
acceptance_pct coupon_issued coupon_accepted
direction destination coffee_house
0 Home 1~3 63.51 74 47
4~8 78.26 46 36
8 or more 56.67 30 17
Never 29.01 162 47
Work 1~3 74.47 94 70
4~8 81.48 54 44
8 or more 80.95 42 34
Never 38.32 214 82
1 Home 1~3 44.81 154 69
4~8 52.63 76 40
8 or more 46.30 54 25
Never 14.04 292 41
No Urgent Place 1~3 74.87 561 420
4~8 72.43 272 197
8 or more 73.56 174 128
Never 42.26 1046 442
Work 1~3 39.71 136 54
4~8 49.15 59 29
8 or more 50.00 42 21
Never 21.79 234 51
In [53]:
#use a bar plot to visualize the acceptance rate based on coffee house visit freq
#lets ignore the drivers who never visit coffee house since the acceptance rate is lesser there

fig = px.histogram(coffee_coupons_accepted_pct_dir.reset_index().query('coffee_house!="Never"'), x="destination", y='acceptance_pct',color="direction",
                   facet_col="coffee_house",
                   labels={'destination':'Destination', 'direction':'Direction from destination',
                           'gender':'Gender','coffee_house':'Coffee House visits' },
                    barmode='group', text_auto=True,
                   title='Coffee House Coupons Acceptance based on travel direction').update_xaxes(categoryorder='total ascending')
fig.update_layout( yaxis_title="Acceptance %")
fig.show()
74.4763.5174.8739.7144.81No Urgent PlaceWorkHome0102030405060708081.4878.2672.4349.1552.63No Urgent PlaceWorkHome80.9556.6773.565046.3No Urgent PlaceWorkHome
Direction from destination01Coffee House Coupons Acceptance based on travel directionDestinationDestinationDestinationAcceptance %Coffee House visits=1~3Coffee House visits=4~8Coffee House visits=8 or more
plotly-logomark

Here, direction = 0 represents going in the same direction whereas 1 represents opposite direction The bar shows that when travelling to home or work, the acceptance rate is higher as compared to when driving away from home or work.

The drivers are always returning to a non urgent place and have a high acceptance rate for that

  1. Analyze coupon attribute i.e. time to expire
In [54]:
#compute acceptance rate based on coupon attribute
coffee_coupons_accepted_pct_tte = get_acceptance_pct(df_coffee, ['expiration','has_children','coffee_house'])
coffee_coupons_accepted_pct_tte
Out[54]:
acceptance_pct coupon_issued coupon_accepted
expiration has_children coffee_house
1d 0 1~3 72.76 257 187
4~8 82.09 134 110
8 or more 85.26 95 81
Never 40.86 558 228
1 1~3 75.45 167 126
4~8 76.85 108 83
8 or more 76.79 56 43
Never 39.31 318 125
2h 0 1~3 59.58 381 227
4~8 59.41 170 101
8 or more 55.10 98 54
Never 29.39 667 196
1 1~3 56.07 214 120
4~8 54.74 95 52
8 or more 50.54 93 47
Never 28.15 405 114
In [55]:
fig = px.histogram(coffee_coupons_accepted_pct_tte.reset_index().query('coffee_house!="Never"'), x="expiration", y='acceptance_pct',color="has_children",
                   facet_col="coffee_house",
                   labels={'expiration':'Expiration', 'has_children':'Children',
                           'gender':'Gender','coffee_house':'Coffee House visits' },
                    barmode='group', text_auto=True,
                   title='Coffee House Coupons Acceptance based on coupon expiry duration').update_xaxes(categoryorder='total ascending')
fig.update_layout( yaxis_title="Acceptance %")
fig.show()
59.5872.7656.0775.452h1d0102030405060708059.4182.0954.7476.852h1d55.185.2650.5476.792h1d
Children01Coffee House Coupons Acceptance based on coupon expiry durationExpirationExpirationExpirationAcceptance %Coffee House visits=1~3Coffee House visits=4~8Coffee House visits=8 or more
plotly-logomark

Coupons with longer expiration (1day) have a higher acceptance rate compared to the ones expiring sooner.

Number of children doesnt make a big difference in the acceptance of coffee house coupon

  1. Analyze income based acceptance for drivers who visit coffee house atleast once a month
In [56]:
#compute acceptance rate based on income
coffee_coupons_accepted_pct_income = get_acceptance_pct(df_coffee.query('coffee_house!="Never"'), ['income','time'])
coffee_coupons_accepted_pct_income
Out[56]:
acceptance_pct coupon_issued coupon_accepted
income time
0 - 12500 10AM 69.44 36 25
10PM 56.52 23 13
2PM 92.00 25 23
6PM 60.98 41 25
7AM 74.36 39 29
100000 - 200000 10AM 85.45 55 47
10PM 41.18 17 7
2PM 67.35 49 33
6PM 56.58 76 43
7AM 49.18 61 30
12500 - 24999 10AM 86.44 59 51
10PM 58.33 12 7
2PM 73.91 69 51
6PM 65.48 84 55
7AM 55.07 69 38
25000 - 37499 10AM 78.33 60 47
10PM 45.45 22 10
2PM 77.61 67 52
6PM 63.89 72 46
7AM 59.72 72 43
37500 - 49999 10AM 87.93 58 51
10PM 47.62 21 10
2PM 78.26 46 36
6PM 60.00 75 45
7AM 75.93 54 41
50000 - 62499 10AM 79.66 59 47
10PM 52.94 17 9
2PM 57.14 56 32
6PM 53.03 66 35
7AM 52.63 57 30
62500 - 74999 10AM 65.52 29 19
10PM 16.67 6 1
2PM 75.86 29 22
6PM 50.00 42 21
7AM 60.00 35 21
75000 - 87499 10AM 100.00 10 10
10PM 44.44 9 4
2PM 63.64 11 7
6PM 54.55 22 12
7AM 35.71 14 5
87500 - 99999 10AM 80.56 36 29
10PM 60.00 10 6
2PM 77.42 31 24
6PM 58.54 41 24
7AM 57.69 26 15
In [57]:
fig = px.histogram(coffee_coupons_accepted_pct_income.reset_index(), x="income", y='acceptance_pct',color='time',
                   
                   labels={'income':'Income', 'time':'Time'},
                    barmode='group', text_auto=True,
                   title='Coffee House Coupons Acceptance based on driver income and time of the day',
                   category_orders={"income": ["0 - 12500", "12500 - 24999", '25000 - 37499','37500 - 49999','50000 - 62499', '62500 - 74999','75000 - 87499', '87500 - 99999'],
                                   "time":["7AM","10AM","2PM","6PM","10PM"]})
fig.update_layout( yaxis_title="Acceptance %")
fig.show()
74.3655.0759.7275.9352.636035.7157.6949.1869.4486.4478.3387.9379.6665.5210080.5685.459273.9177.6178.2657.1475.8663.6477.4267.3560.9865.4863.896053.035054.5558.5456.5856.5258.3345.4547.6252.9416.6744.446041.180 - 1250012500 - 2499925000 - 3749937500 - 4999950000 - 6249962500 - 7499975000 - 8749987500 - 99999100000 - 200000020406080100
Time7AM10AM2PM6PM10PMCoffee House Coupons Acceptance based on driver income and time of the dayIncomeAcceptance %
plotly-logomark

Irrespective of income, the acceptance rate seems similar across the income range Drivers driving around 10AM has the most acceptance rate comparatively

  1. Analyze based on visits to different establishments
In [58]:
coffee_coupons_accepted_pct_bar = get_acceptance_pct(df_coffee, ['bar' ] )
coffee_coupons_accepted_pct_bar

coffee_coupons_accepted_pct_tkaway = get_acceptance_pct(df_coffee, ['takeaway' ] )
coffee_coupons_accepted_pct_tkaway

coffee_coupons_accepted_pct_rstlt20 = get_acceptance_pct(df_coffee, ['restaurant_lt_20' ] )
coffee_coupons_accepted_pct_rstlt20

coffee_coupons_accepted_pct_rstgt20 = get_acceptance_pct(df_coffee, ['restaurant_gt_20' ] )
coffee_coupons_accepted_pct_rstgt20
Out[58]:
acceptance_pct coupon_issued coupon_accepted
restaurant_gt_20
1~3 51.63 1042 538
4~8 56.83 183 104
8 or more 63.74 91 58
Never 47.76 2500 1194
In [59]:
fig = go.Figure()
fig.add_trace(go.Bar(
    x=coffee_coupons_accepted_pct_bar.index,
    y=coffee_coupons_accepted_pct_bar['acceptance_pct'],
    name='Bar Visits',
    marker_color='indianred'
))
fig.add_trace(go.Bar(
    x=coffee_coupons_accepted_pct_tkaway.index,
    y=coffee_coupons_accepted_pct_tkaway['acceptance_pct'],
    name='Takeaway frequency',
    marker_color='lightsalmon'
))
fig.add_trace(go.Bar(
    x=coffee_coupons_accepted_pct_rstlt20.index,
    y=coffee_coupons_accepted_pct_rstlt20['acceptance_pct'],
    name='Restaurant < 20',
    marker_color='crimson'
))
fig.add_trace(go.Bar(
    x=coffee_coupons_accepted_pct_rstgt20.index,
    y=coffee_coupons_accepted_pct_rstgt20['acceptance_pct'],
    name='Restaurant > 20',
    marker_color='pink'
))
# Here we modify the tickangle of the xaxis, resulting in rotated labels.
fig.update_layout(barmode='group',xaxis_title='Visit frequency per month', yaxis_title="Acceptance %",
                  title='Coffee house coupon acceptance rate based on driver visit to other establishments')

fig.show()
1~34~88 or moreNever0102030405060
Bar VisitsTakeaway frequencyRestaurant < 20Restaurant > 20Coffee house coupon acceptance rate based on driver visit to other establishmentsVisit frequency per monthAcceptance %
plotly-logomark

It looks like the acceptance rate of coffee house coupon does not vary much based on how many times drivers visit other establishments

  1. Analyze based on marital status
In [60]:
coffee_coupons_accepted_pct_marital = get_acceptance_pct(df_coffee.query('coffee_house!="Never"'), ['marital_status','coffee_house' ] )
coffee_coupons_accepted_pct_marital
Out[60]:
acceptance_pct coupon_issued coupon_accepted
marital_status coffee_house
Divorced 1~3 60.00 20 12
4~8 62.50 32 20
8 or more 57.14 28 16
Married partner 1~3 63.36 393 249
4~8 72.46 167 121
8 or more 67.20 125 84
Single 1~3 68.59 417 286
4~8 65.49 226 148
8 or more 68.70 131 90
Unmarried partner 1~3 59.79 189 113
4~8 69.51 82 57
8 or more 60.34 58 35
In [61]:
fig = px.histogram(coffee_coupons_accepted_pct_marital.reset_index(), x="coffee_house", y='acceptance_pct',
                   color='marital_status',
                   labels={'income':'Income', 'marital_status':'Marital Status','coffee_house':'Coffee house visits per month'},
                    barmode='group', text_auto=True,
                   title='Coffee House Coupons Acceptance based on driver income')
fig.update_layout( yaxis_title="Acceptance %")
fig.show()
6062.557.1463.3672.4667.268.5965.4968.759.7969.5160.341~34~88 or more010203040506070
Marital StatusDivorcedMarried partnerSingleUnmarried partnerCoffee House Coupons Acceptance based on driver incomeCoffee house visits per monthAcceptance %
plotly-logomark

No difference based on marital status

  1. Will the driving time to the coupon location impact the acceptance rate?
In [62]:
coffee_coupons_accepted_pct_time = get_acceptance_pct(df_coffee.query('coffee_house!="Never"'), ['driving_time','coffee_house' ] )
coffee_coupons_accepted_pct_time
Out[62]:
acceptance_pct coupon_issued coupon_accepted
driving_time coffee_house
10 1~3 73.92 487 360
4~8 77.42 248 192
8 or more 67.68 164 111
20 1~3 59.38 448 266
4~8 60.63 221 134
8 or more 65.56 151 99
30 1~3 40.48 84 34
4~8 52.63 38 20
8 or more 55.56 27 15
In [64]:
fig = px.histogram(coffee_coupons_accepted_pct_time.reset_index(), x="coffee_house", y='acceptance_pct',
                   color='driving_time',
                   labels={'driving_time':'Driving Time to Coupon location', 'coffee_house':'Coffee house visits per month'},
                    barmode='group', text_auto=True,
                   title='Coffee House Coupons Acceptance based on driving time to coupon location')
fig.update_layout( yaxis_title="Acceptance %")
fig.show()
73.9277.4267.6859.3860.6365.5640.4852.6355.561~34~88 or more01020304050607080
Driving Time to Coupon location102030Coffee House Coupons Acceptance based on driving time to coupon locationCoffee house visits per monthAcceptance %
plotly-logomark

Easy to conclude that the lesser the driving time, the higher the acceptance rate of the coupon.

Findings¶

  • The drivers who accept coupons visit the establishment atleast once a month. We saw this for both bar as well as coffee house. This is irrrespective of age, gender or income.

  • Drivers are more likely to accept a coupon when the weather is warmer (80F ) as compared to colder temperatures

  • Drivers driving at 10 AM are most likely to accept a coupon

  • A male driver is most likely to accept coffee house coupon if they are 16 or younger. They are more likely to accept bar coupon if they are 21 or older

  • When a coupon location is shorter drive away, the likelyhood of it getting accepted is higher

  • Coupons with longer expiration (1day) have a higher acceptance rate compared to the ones expiring sooner.

  • A driver driving to a non urgent place is more likely to accept a coupon as compared to when driving to home or work

Next Steps¶

Analyze the coupons for restaurants and takeaways as well in a similar fashion and verify if the findings from coffee house and bar also applies to these categories.